(54) Unified messaging system with automatic language identification for text-to-speech
conversion



(57) A unified messaging system includes a voice
gateway server coupled to an electronic mail system
and a private branch exchange (PBX). The voice gateway
server provides voice messaging services to a set
of subscribers. Within the voice gateway server, a
trigraph analyzer sequentially examines 3-character
combinations within a text message; determines occurrence
frequencies for the character combinations; compares
the occurrence frequencies with reference occurrence
statistics modeled from text samples written in particular
languages; and generates a language identifier and
a likelihood value for the text message. Based upon the
language identifier, a message inquiry unit selects an
appropriate text-to-speech engine for converting the
text message into computer-generated speech that is
played to a subscriber.
Description

CROSS-REFERENCE TO RELATED PUBLICATIONS 

The present invention is related to and incorporates
by reference U.S. Patent No. 5,557,659, entitled
"Electronic Mail System Having Integrated Voice Messages."

BACKGROUND OF THE INVENTION 

1.1 Field of the Invention 

The present invention relates to systems and meth- 
ods for voice and text messaging, as well as systems 
and methods for language recognition. More particu- 
larly, the present invention is a unified messaging sys- 
tem that automatically identifies a language associated 
with a text message, and performs an appropriate text- 
to-speech conversion. 

1.2 Description of the Background Art

Computer-based techniques for converting text into 
speech have become well-known in recent years. Via 
such techniques, textual data is translated to audio 
information by a text-to-speech conversion "engine," 
which most commonly comprises software. Examples 
of text-to-speech software include Apple Computer's 
Speech Manager (Apple Computer Corporation, Cuper- 
tino, CA), and Digital Equipment Corporation's DECTalk 
(Digital Equipment Corporation, Cambridge, MA). In 
addition to converting textual data into speech, such 
software is responsive to user commands for controlling 
volume, pitch, rate, and other speech-related parame- 
ters. 

A text-to-speech engine generally comprises a text 
analyzer, a syntax and context analyzer, and a synthe- 
sis module. The text analyzer, in conjunction with the 
syntax and context analyzer, utilizes a rule-based index
to identify fundamental grammatical units within textual
data. The fundamental grammatical units are typically
word and/or phoneme-based,
and the rule-based index is correspondingly referred to 
as a phoneme library. Those skilled in the art will under- 
stand that the phoneme library typically includes a 
word-based dictionary for the conversion of ortho- 
graphic data into a phonemic representation. The syn- 
thesis module either assembles or generates speech 
sequences corresponding to the identified fundamental 
grammatical units, and plays the speech sequences to a 
listener. 

Text-to-speech conversion can be very useful within
the context of unified or integrated messaging systems.
In such messaging systems, a voice processing server
is coupled to an electronic mail system, such that a
user's e-mail in-box provides message notification as
well as access to messaging services for e-mail
messages, voice messages, and possibly other types of
messages such as faxes. An example of a unified
messaging system is Octel's Unified Messenger (Octel
Communications Corporation, Milpitas, CA). Such
systems selectively translate an e-mail message into
speech through the use of text-to-speech conversion. A
user calling from a remote telephone can therefore
readily listen to both voice and e-mail messages. Thus,
a unified messaging system employing text-to-speech
conversion eliminates the need for a user to have direct
access to their computer during message retrieval
operations.

In many situations, messaging system users can
expect to receive textual messages written in different
languages. For example, a person conducting business
in Europe might receive e-mail messages written in
English, French, or German. To successfully convert
text into speech within the context of a particular
language requires a text-to-speech engine designed for
that language. Thus, to successfully convert French text
into spoken French requires a text-to-speech engine
designed for the French language, including a
French-specific phoneme library. Attempting to convert French
text into spoken language through the use of an English
text-to-speech engine would likely produce a large
amount of unintelligible output.

In the prior art, messaging systems rely upon a
human reader to specify a given text-to-speech engine
to be used in converting a message into speech.
Alternatively, some systems enable a message originator to
specify a language identification code that is sent with
the message. Both approaches are inefficient and
inconvenient. What is needed is a messaging system
providing automatic written language identification as a
prelude to text-to-speech conversion.

SUMMARY OF THE INVENTION 

The present invention is a unified messaging system
providing automatic language identification for the
conversion of textual messages into speech. The unified
messaging system comprises a voice gateway
server coupled to a computer network and a Private
Branch Exchange (PBX). The computer network
includes a plurality of computers coupled to a file server,
through which computer users identified in an electronic
mail (e-mail) directory exchange messages. The voice
gateway server facilitates the exchange of messages
between computer users and a telephone system, and
additionally provides voice messaging services to
subscribers, each of whom is preferably a computer user
identified in the e-mail directory.

The voice gateway server preferably comprises a
voice board, a network interface unit, a processing unit,
a data storage unit, and a memory wherein a set of
voice messaging application units; a message buffer; a
plurality of text-to-speech engines and corresponding
phoneme libraries; a trigraph analyzer; and a set of
corecurrence libraries reside. Each voice messaging
application unit comprises program instructions for
providing voice messaging functions such as call
answering, automated attendant, and message store/forward
operations to voice messaging subscribers.

A message inquiry unit directs message playback 
operations. In response to a subscriber's issuance of a 
voice message review request, the message inquiry unit 
plays the subscriber's voice messages in a conventional 
manner. In response to a text message review request, 
the message inquiry unit initiates automatic language 
identification operations, followed by a text-to-speech 
conversion performed in accordance with the results of 
the language identification operations. 

The trigraph analyzer examines a text sequence, 
and performs language identification operations by first 
determining the occurrence frequencies of sequential 3- 
character combinations within the text, and then com- 
paring the determined occurrence frequencies with ref- 
erence occurrence statistics for various languages. The 
set of reference occurrence statistics associated with a 
given language are stored together as a corecurrence 
library. The trigraph analyzer determines a closest 
match between the determined occurrence frequencies 
and a particular corecurrence library, and returns a cor- 
responding language identifier and likelihood value to 
the message inquiry unit. 

The message inquiry unit subsequently selects a 
text-to-speech engine and an associated phoneme 
library, and initiates the conversion of the text message 
into computer-generated speech that is played to the 
subscriber in a conventional manner. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram of a preferred embodi- 
ment of a unified messaging system constructed in 
accordance with the present invention; 
Figure 2 is a block diagram of a first and preferred 
embodiment of a voice server constructed in 
accordance with the present invention; 
Figure 3 is a flowchart of a first and preferred 
method for providing automatic language identifica- 
tion for text-to-speech conversion in the present 
invention; 

Figure 4 is a block diagram of a second embodi- 
ment of a voice server constructed in accordance 
with the present invention; and 
Figure 5 is a flowchart of a second method for pro- 
viding automatic language identification for text-to- 
speech conversion in the present invention. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to Figure 1, a block diagram of a
preferred embodiment of a unified messaging system 100
constructed in accordance with the present invention is
shown. The unified messaging system 100 comprises a
set of telephones 110, 112, 114 coupled to a Private
Branch Exchange (PBX) 120; a computer network 130
comprising a plurality of computers 132 coupled to a file
server 134 via a network line 136, where the file server
134 is additionally coupled to a data storage device 138;
and a voice gateway server 140 that is coupled to the
network line 136, and coupled to the PBX 120 via a set
of telephone lines 142 as well as an integration link 144.
The PBX 120 is further coupled to a telephone network
via a collection of trunks 122, 124, 126. The unified
messaging system 100 shown in Figure 1 is equivalent
to that described in U.S. Patent No. 5,557,659, entitled
"Electronic Mail System Having Integrated Voice
Messages," which is incorporated herein by reference.
Those skilled in the art will recognize that the teachings
of the present invention are applicable to essentially any
unified or integrated messaging environment.

In the present invention, conventional software
executing upon the computer network 130 provides file
transfer services, group access to software
applications, as well as an electronic mail (e-mail) system
through which computer users can transfer messages
as well as message attachments between their
computers 132 via the file server 134. In an exemplary
embodiment, Microsoft Exchange™ software (Microsoft
Corporation, Redmond, WA) executes upon the
computer network 130 to provide such functionality. Within the
file server 134, an e-mail directory associates each
computer user's name with a message storage location,
or "in-box," and a network address, in a manner that will
be readily understood by those skilled in the art. The
voice gateway server 140 facilitates the exchange of
messages between the computer network 130 and a
telephone system. Additionally, the voice gateway
server 140 provides voice messaging services such as
call answering, automated attendant, voice message
store and forward, and message inquiry operations to
voice messaging subscribers. In the preferred
embodiment, each subscriber is a computer user identified in
the e-mail directory, that is, having a computer 132
coupled to the network 130. Those skilled in the art will
recognize that in an alternate embodiment, the voice
messaging subscribers could be a subset of computer
users. In yet another alternate embodiment, the
computer users could be a subset of a larger pool of voice
messaging subscribers, which might be useful when the
voice gateway server is primarily used for call
answering.

Referring also now to Figure 2, a block diagram of a
first and preferred embodiment of a voice gateway
server 140 constructed in accordance with the present
invention is shown. In the preferred embodiment, the
voice gateway server 140 comprises a voice board 200,
a network interface unit 202, a processing unit 204, a
data storage unit 206, and a memory 210 wherein a
plurality of voice messaging application units 220, 222,
224, 226; a message buffer 230; a set of text-to-speech
engines 242, 243, 244 and corresponding phoneme
libraries 252, 253, 254; a trigraph analyzer 260; and a
plurality of corecurrence libraries 272, 273, 274, 275,
276 reside. Each element within the voice gateway
server 140 is coupled to a common bus 299. The
network interface unit 202 is additionally coupled to the
network line 136, and the voice board 200 is coupled to the
PBX 120.

The voice board 200 preferably comprises conven- 
tional circuitry that interfaces a computer system with 
telephone switching equipment, and provides telephony 
and voice processing functions. The network interface 
unit 202 preferably comprises conventional circuitry that 
manages data transfers between the voice gateway 
server 140 and the computer network 130. In the pre- 
ferred embodiment, the processing unit 204 and the 
data storage unit 206 are also conventional. 

The voice messaging application units 220, 222, 
224, 226 provide voice messaging services to subscrib- 
ers, including call answering, automated attendant, and 
voice message store and forward operations. A mes- 
sage inquiry unit 226 directs telephone-based message 
playback operations in response to a subscriber 
request. In response to a voice message review 
request, the message inquiry unit 226 initiates the 
retrieval of a voice message associated with the sub- 
scriber's in-box, followed by the playing of the voice 
message to the user via the telephone in a conventional 
manner. In response to a text message review request, 
the message inquiry unit 226 initiates retrieval of a text 
message associated with the subscriber's in-box, fol- 
lowed by automatic language recognition and text-to- 
speech conversion operations, as described in detail 
below with reference to Figure 3. In the preferred 
embodiment, each voice messaging application unit 
220, 222, 224, 226 comprises program instruction 
sequences that are executable by the processing unit 
204. 

The message buffer 230 comprises a portion of the
memory 210 reserved for temporarily storing messages
before or after message exchange with the file server 
134. The text-to-speech engines 242, 243, 244, 245, 
246 preferably comprise conventional software for 
translating textual data into speech. Those skilled in the 
art will readily understand that in an alternate embodi- 
ment, one or more portions of a text-to-speech engine 
242, 243, 244, 245, 246 could be implemented using 
hardware. 

The number of text-to-speech engines 242, 243,
244 resident within the memory 210 at any given time is
determined according to the language environment in
which the present invention is employed. In the
preferred embodiment, the memory 210 includes a
text-to-speech engine 242, 243, 244 for each language within a
group of most-commonly expected languages.
Additional text-to-speech engines 245, 246 preferably reside
upon the data storage unit 206, and are loaded into the
memory 210 when text-to-speech conversion for a
language outside the aforementioned group is required, as
described in detail below. In an exemplary embodiment,
text-to-speech engines 242, 243, 244 corresponding to
English, French, and German reside within the memory
210, while text-to-speech engines 245, 246 for
Portuguese, Italian, and/or other languages reside upon the
data storage unit 206. Those skilled in the art will
recognize that in an alternate embodiment, the number of
text-to-speech engines 242, 243, 244 resident within
the memory could be determined according to a
memory management technique, such as virtual memory
methods, where text-to-speech engines 242, 243, 244
are conventionally swapped out to the data storage unit
206 as required.

The memory 210 preferably includes a conventional
phoneme library 252, 253, 254 corresponding to
each text-to-speech engine 242, 243, 244 residing
therein. In the preferred embodiment, a phoneme library
255, 256 also resides upon the data storage unit 206 for
each text-to-speech engine 245, 246 stored thereupon.

The present invention preferably relies upon
n-graph methods for textual language identification, in
particular, techniques developed by Clive Souter and
Gavin Churcher at the University of Leeds in the United
Kingdom, as reported in 1) "Bigram and Trigram Models
for Language Identification and Classification,"
Proceedings of the AISB Workshop on Computational
Linguistics for Speech and Handwriting Recognition,
University of Leeds, 1994; 2) "Natural Language
Identification Using Corpus-Based Models," Hermes Journal
of Linguistics 13, 183-204, 1994; and 3) "N-gram Tools
for Generic Symbol Processing," M. Sc. Thesis of Phil
Cave, School of Computer Studies, University of Leeds,
1995.

In n-graph language identification, the occurrence
frequencies of successive n-character combinations
within a textual message are compared with reference
n-character occurrence statistics associated with
particular languages. The reference statistics for any given
language are automatically derived or modeled from
text samples taken from that language. Herein, the
reference n-character occurrence statistics for a given
language are stored together as a corecurrence library
272, 273, 274, 275, 276.
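
By way of non-limiting illustration, the derivation of reference occurrence statistics from a text sample might be sketched as follows. The function name, the case and whitespace normalization, the use of relative frequencies, and the miniature sample strings are all assumptions made for this example; they are not taken from the cited University of Leeds work, and a practical library would be modeled from far larger samples.

```python
from collections import Counter


def build_reference_statistics(sample_text: str, n: int = 3) -> dict[str, float]:
    """Return the relative frequency of each n-character combination in a sample."""
    # Assumed normalization: lower-case the sample and collapse runs of whitespace.
    text = " ".join(sample_text.lower().split())
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical miniature samples standing in for real language corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "fr": "le renard brun saute par dessus le chien paresseux et le chat",
}

# One set of reference statistics (one "library") per language.
LIBRARIES = {lang: build_reference_statistics(text) for lang, text in SAMPLES.items()}
```

Each entry in `LIBRARIES` plays the role of one corecurrence library in this sketch: a mapping from a three-character combination to how often it occurs in text of that language.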

The present invention preferably employs the
trigraph analyzer 260 and corecurrence libraries 272,
273, 274, 275, 276 to perform trigraph-based language
identification, that is, language identification based
upon the statistical occurrences of three-letter
combinations. In the preferred embodiment, the memory 210
includes a corecurrence library 272, 273, 274, 275, 276
corresponding to each text-to-speech engine 242, 243,
244 within the memory 210 as well as each
text-to-speech engine 245, 246 stored upon the data storage
device 206.

The trigraph analyzer 260 returns a language
identifier and a likelihood or percentage value that indicates
relative language identification certainty. As developed
at the University of Leeds, the trigraph analyzer 260 is
approximately 100% accurate when textual input
comprises at least 175 characters. The trigraph analyzer
260 additionally maintains high language identification
accuracy, typically greater than 90%, for shorter-length
text sequences.

In an exemplary embodiment, the voice gateway
server 140 is a personal computer having a 200 MHz
Intel Pentium™ Processor (Intel Corporation, Santa
Clara, CA); 128 Megabytes of Random Access Memory
(RAM); an Ethernet-based network interface unit 202; a
Redundant Array of Inexpensive Disks (RAID) drive
serving as the data storage unit 206; a Rhetorex voice
board (Rhetorex Corporation, San Jose, CA); DECTalk
text-to-speech engines 242, 243, 244, 245, 246 and
corresponding phoneme libraries 252, 253, 254, 255,
256 (Digital Equipment Corporation, Cambridge, MA);
the aforementioned trigraph analyzer 260 and
associated corecurrence libraries 272, 273, 274, 275, 276
developed at the University of Leeds; and voice
messaging application units 220, 222, 224, 226
implemented using Octel's Unified Messenger software
(Octel Communications Corporation, Milpitas, CA).

Referring now to Figure 3, a flowchart of a first and
preferred method for providing automatic language 
identification for text-to-speech conversion is shown. 
The preferred method begins in step 300 in response to 
a subscriber's issuance of a text message review 
request, with the message inquiry unit 226 retrieving a 
text message from the subscriber's in-box, or from a 
particular data file or folder as specified by the sub- 
scriber. In the preferred embodiment, the subscriber's 
in-box corresponds to a file server storage location, and 
the retrieved text message is transferred to the mes- 
sage buffer 230. Following step 300, the message 
inquiry unit 226 issues an identification directive to the 
trigraph analyzer 260 in step 302, thereby initiating lan- 
guage identification. 

In response to the identification directive, the tri- 
graph analyzer 260 examines successive 3-character 
combinations within the text message currently under 
consideration, and determines occurrence frequencies 
for the character combinations in step 304. In the pre- 
ferred embodiment, the trigraph analyzer 260 examines 
the first 175 characters of the text message in the event
that the text message is sufficiently long; otherwise, the 
trigraph analyzer 260 examines the longest character 
sequence possible. 
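
By way of non-limiting illustration, step 304 might be sketched as follows. The 175-character window is taken from the description above; the function name and the use of raw counts rather than normalized frequencies are assumptions made for this example.

```python
from collections import Counter

MAX_CHARS = 175  # examination window stated in the description


def trigraph_frequencies(message: str, max_chars: int = MAX_CHARS) -> Counter:
    """Count successive 3-character combinations within the examined window."""
    # Slicing yields the whole message when it is shorter than the window.
    window = message[:max_chars]
    return Counter(window[i:i + 3] for i in range(len(window) - 2))


frequencies = trigraph_frequencies("abcabc")
```

For the six-character input shown, the successive combinations are "abc", "bca", "cab", and "abc", so "abc" is counted twice. A message shorter than three characters yields an empty count.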

Following the determination of the occurrence fre- 
quencies for the current text message, the trigraph ana- 
lyzer 260 compares the occurrence frequencies with the 
reference occurrence statistics in each corecurrence 
library 272, 273, 274, 275, 276 and determines a clos- 
est match with a particular corecurrence library 272, 
273, 274, 275 in step 308. Upon determining the closest 
match, the trigraph analyzer 260 returns a language 
identifier and an associated likelihood value to the mes- 
sage inquiry unit 226 in step 310. Those skilled in the 
art will recognize that the trigraph analyzer 260 could
return a set of language identifiers and a likelihood
value corresponding to each language identifier in an 
alternate embodiment. 
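
By way of non-limiting illustration, the comparison and closest-match determination might be sketched as follows. The description does not specify the comparison measure, so the overlap score used here (the probability mass shared between the message and a reference library, ranging from 0 to 1) is purely an assumption for this example, as are the function names and the two miniature reference texts.

```python
from collections import Counter


def relative_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each successive 3-character combination."""
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def closest_match(message_freqs, libraries):
    """Return (language identifier, likelihood) for the best-overlapping library."""
    best_lang, best_score = None, 0.0
    for lang, reference in libraries.items():
        # Assumed measure: histogram intersection of the two distributions.
        score = sum(min(p, reference.get(g, 0.0)) for g, p in message_freqs.items())
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang, best_score


# Hypothetical miniature reference libraries.
libraries = {
    "en": relative_frequencies("the weather is nice and the meeting is at three"),
    "de": relative_frequencies("das wetter ist schön und die besprechung ist um drei"),
}
language, likelihood = closest_match(relative_frequencies("the meeting is nice"), libraries)
```

In this sketch, a message sharing no three-character combination with any library yields `(None, 0.0)`, corresponding to the case in which no closest match can be determined.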

As long as the text message is written in a language
corresponding to one of the corecurrence libraries 272,
273, 274, 275, 276, the correlation between the
occurrence frequencies and the reference occurrence
statistics is likely to be sufficient for successful language
identification. If the text message is written in a
language that does not correspond to any of the
corecurrence libraries 272, 273, 274, 275, 276 present, the
correlation will be poor, and a closest match cannot be
determined. In the event that the likelihood value
returned by trigraph analyzer 260 is below a minimum
acceptable threshold (for example, 20%), the message
inquiry unit 226 plays a corresponding prerecorded
message to the subscriber via steps 312 and 318. An
exemplary prerecorded message could be "language
identification unsuccessful."
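
By way of non-limiting illustration, the threshold test of step 312 might be sketched as follows. The 20% figure and the prerecorded message wording come from the description above; the function name, the action labels, and the treatment of a likelihood exactly equal to the threshold as acceptable are assumptions made for this example.

```python
MIN_LIKELIHOOD = 0.20  # example threshold of 20% from the description


def next_action(language_id, likelihood, threshold=MIN_LIKELIHOOD):
    """Decide between the prerecorded message and engine selection."""
    if language_id is None or likelihood < threshold:
        # Corresponds to steps 312 and 318: identification was not acceptable.
        return ("play_prerecorded", "language identification unsuccessful")
    # Corresponds to step 314: proceed to select a text-to-speech engine.
    return ("select_engine", language_id)


accepted = next_action("fr", 0.85)
rejected = next_action("fr", 0.10)
```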

Upon receiving the language identifier and an
acceptable likelihood value, the message inquiry unit
226 selects the appropriate text-to-speech engine 242,
243, 244, 245, 246 in step 314. In the event that the
text-to-speech engine 244, 245 and its associated phoneme
library 254, 255 do not presently reside within the
memory 210, the message inquiry unit 226 transfers the
required text-to-speech engine 244, 245 and the
corresponding phoneme library 254, 255 from the data
storage unit 206 into the memory 210.
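
By way of non-limiting illustration, engine selection with on-demand transfer from storage might be sketched as follows. The class name, the use of dictionaries to stand in for the memory and the data storage unit, and the placeholder engine and library names are all hypothetical.

```python
class EngineStore:
    """Tracks which engine and phoneme library pairs are resident in memory."""

    def __init__(self, resident, on_storage):
        self.memory = dict(resident)       # pairs already loaded in memory
        self.storage = dict(on_storage)    # pairs held on the data storage unit
        self.loads = 0                     # number of transfers performed

    def select(self, language_id):
        """Return the pair for a language, transferring it into memory if needed."""
        if language_id not in self.memory:
            if language_id not in self.storage:
                raise KeyError(f"no text-to-speech engine for {language_id!r}")
            self.memory[language_id] = self.storage[language_id]
            self.loads += 1
        return self.memory[language_id]


# Placeholder names; real engines and libraries would be software components.
store = EngineStore(
    resident={"en": ("tts-en", "phonemes-en"), "fr": ("tts-fr", "phonemes-fr")},
    on_storage={"pt": ("tts-pt", "phonemes-pt")},
)
engine, phoneme_library = store.select("pt")
```

A second request for the same language finds the pair already resident, so no further transfer is counted.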

After step 314, the message inquiry unit 226 issues
a conversion directive to the selected text-to-speech
engine 242, 243, 244, 245, 246 in step 316, following
which the text message currently under consideration is
converted to speech and played to the subscriber in a
conventional manner. Upon completion of step 316, the
message inquiry unit 226 determines whether another
text message in the subscriber's in-box, or as specified
by the subscriber, requires consideration in step 320. If
so, the preferred method proceeds to step 300;
otherwise, the preferred method ends.

In an alternate embodiment, steps 312 and 318
could be omitted, such that step 310 directly proceeds
to step 314 to produce a "best guess" text-to-speech
conversion played to the subscriber. In such an
alternate embodiment, the message inquiry unit 226 could
1) disregard the likelihood value; or 2) select the
language identifier associated with a best likelihood value
in the event that multiple language identifiers and
likelihood values are returned.

In the preferred embodiment, textual language
identification is performed, followed by text-to-speech
conversion in the appropriate language. This results in
the subscriber listening to computer-generated speech
that matches the language in which the original text
message was written. In an alternate embodiment,
textual language identification could be performed,
followed by text-to-text language conversion (i.e.,
translation), followed by text-to-speech conversion such
that the subscriber listens to computer-generated
speech in a language with which the subscriber is most
comfortable. To facilitate this alternate embodiment, a
set of subscriber language preference selections is
stored as user-configuration data within a subscriber
information database or directory. The subscriber
information database could reside within the voice gateway
server 140, or it could be implemented in association
with the file server's e-mail directory in a manner those
skilled in the art will readily understand. Additionally, the
voice gateway server 140 is modified to include
additional elements, as described in detail hereafter.

Referring now to Figure 4, a block diagram of a
second embodiment of a voice gateway server 141
constructed in accordance with the present invention is
shown. Elements common to both Figures 2 and 4 are
numbered alike for ease of understanding. In addition to
having the elements shown in Figure 2, the second
embodiment of the voice gateway server 141 includes a
set of conventional text translators 282, 283, 284, 285,
286, each having an associated word dictionary 292,
293, 294, 295, 296. Those skilled in the art will
understand that the word dictionaries 292, 293, 294, 295, 296
are distinct from (i.e., not equivalent to) the phoneme
libraries 252, 253, 254, 255, 256 in content and manner
of use, and that each text translator 282, 283, 284, 285,
286 corresponds to a particular target language
available for subscriber selection. Text translators 282, 283,
284 and word dictionaries 292, 293, 294 corresponding
to most-common subscriber preference selections
reside within the memory 210, while those for
less-frequently selected languages reside upon the data
storage device 206, to be transferred into the memory 210
as required. Those skilled in the art will also understand
that in an alternate embodiment, the text translators
282, 283, 284, 285, 286 and corresponding word
dictionaries 292, 293, 294, 295, 296 could normally reside
upon the data storage device 206, to be swapped into or
out of the memory 210 as required during system
operation. In an exemplary embodiment, the text translators
282, 283, 284, 285, 286 and word dictionaries 292, 293,
294, 295, 296 could be implemented using
commercially-available software such as that provided by
Translation Experts, Ltd. of London, England; or Language
Partners International of Evanston, IL.

Referring now to Figure 5, a flowchart of a second
method for providing automatic language identification
for text-to-speech conversion is shown. The second
method begins in step 500 in response to a subscriber's
issuance of a text message review request, with the
message inquiry unit 226 retrieving the subscriber's
language preference settings. Next, in step 501, the
message inquiry unit 226 retrieves a text message from the
subscriber's in-box or from a data file or data folder as
specified by the subscriber, and stores or copies the
retrieved message into the message buffer 230.
Following step 501, the message inquiry unit 226 issues an
identification directive to the trigraph analyzer 260 in
step 502, thereby initiating language identification.
Language identification is preferably performed in steps 504
through 512 in an analogous manner to that described
above in steps 304 through 312 of Figure 3. Successful
language identification results when the trigraph
analyzer 260 returns a language identifier and a likelihood
value greater than a minimum threshold value to the
message inquiry unit 226.

Upon receiving a language identifier and an
acceptable likelihood value, the message inquiry unit 226
selects the appropriate text translator 282, 283, 284,
285, 286 and associated word dictionary 292, 293, 294,
295, 296, and issues a translation directive in step 514,
thereby performing the translation of the current text
message into the target language given by the
subscriber's language preference settings. Next, in step
516, the message inquiry unit 226 issues a conversion
directive to the text-to-speech engine 242, 243, 244,
245, 246 that corresponds to the subscriber's language
preference settings, causing the conversion of the
translated text message to speech. The speech is
preferably played to the subscriber in a conventional
manner. Upon completion of step 516, the message inquiry
unit 226 determines whether another text message in
the subscriber's in-box or as specified by the subscriber
requires consideration in step 520. If so, the preferred
method proceeds to step 501; otherwise, the preferred
method ends.
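
By way of non-limiting illustration, the per-message portion of the second method (identification, translation into the subscriber's preferred language, then conversion) might be sketched as follows. The identifier, translator, and engine are supplied as plain callables, and the stubs shown merely tag their input; every name, signature, and the 20% default threshold used here are assumptions made for this example and do not represent any commercial translator or text-to-speech interface.

```python
def review_text_message(text, preferred_language, identify, translators, engines,
                        threshold=0.20):
    """Identify the source language, translate, then convert to speech."""
    language_id, likelihood = identify(text)              # steps 502 to 510
    if language_id is None or likelihood < threshold:     # step 512
        return "language identification unsuccessful"
    # Step 514: the translator is chosen by the subscriber's target language.
    translated = translators[preferred_language](text, language_id)
    # Step 516: the engine is chosen by the same preference setting.
    return engines[preferred_language](translated)


# Hypothetical stubs that label their input instead of doing real work.
def stub_identify(text):
    return ("fr", 0.90)


translators = {"en": lambda text, source: f"[{source}->en] {text}"}
engines = {"en": lambda text: f"<speech en: {text}>"}

played = review_text_message("bonjour", "en", stub_identify, translators, engines)
```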

Those skilled in the art will recognize that in the
alternate embodiment, each word dictionary 292, 293,
294, 295, 296 should include words that may be
particular to a given work environment in which the present
invention may be employed. For example, use of the
alternate embodiment in a computer-related business
setting would necessitate word dictionaries 292, 293,
294, 295, 296 that include computer-related terms to
ensure proper translation. In general, the first and
preferred embodiment of the present invention is more
robust and flexible than the second embodiment
because direct conversion of text into speech, without
intermediate text-to-text translation, is not constrained
by the limitations of a word dictionary and is less
susceptible to problems arising from word spelling
variations.

While the present invention has been described
with reference to certain preferred embodiments, those
skilled in the art will recognize that various modifications
can be provided. For example, a language identification
tool based upon techniques other than n-graph
methods could be utilized instead of the trigraph analyzer
260 and associated corecurrence libraries 272, 273,
274, 275, 276. As another example, one or more
text-to-speech engines 242, 243, 244, 245, 246 could be
implemented via hardware, such as through "off-board"
text-to-speech engines accessed through the use of remote
procedure calls. As yet another example, converted
speech data or translated text data could be stored for
future use, which could be useful in a store-once,
multiple-playback environment.

From the above it can be seen that the present 
invention is related to a unified messaging system 
includes a voice gateway server coupled to an elec- 
tronic mail system and a private branch exchange 
(PBX). The voice gateway server provides voice mes- 
saging services to a set of subscribers. Within the voice 
gateway server, a trigraph analyzer sequentially exam- 
ines 3 -character combinations within a text message; 
determines occurrence frequencies for the character 
combinations; compares the occurrence frequencies 
with reference occurrence statistics modeled from text 
samples written in particular languages; and generates 
a language identifier and a likelihood value for the text 
message. Based upon the language identifier, a mes- 
sage inquiry unit selects an appropriate text-to-speech 
engine for converting the text message into computer- 
generated speech that is played to a subscriber. 
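
The trigraph comparison and engine selection summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the tiny reference samples, the cosine-similarity score used here as the "likelihood value," the engine names, and the function names `trigraph_frequencies`, `build_library`, `similarity`, and `identify_language` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def trigraph_frequencies(text):
    """Return relative occurrence frequencies of sequential 3-character combinations."""
    normalized = " ".join(text.lower().split())
    counts = Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


def build_library(samples):
    """Model reference statistics for one language from sample texts (hypothetical helper)."""
    return trigraph_frequencies(" ".join(samples))


def similarity(freqs, library):
    """Cosine similarity between two frequency tables (an assumed scoring choice)."""
    dot = sum(v * library.get(tri, 0.0) for tri, v in freqs.items())
    norm_a = sqrt(sum(v * v for v in freqs.values()))
    norm_b = sqrt(sum(v * v for v in library.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_language(text, libraries):
    """Return (language identifier, likelihood value) for the closest-matching library."""
    freqs = trigraph_frequencies(text)
    scores = {lang: similarity(freqs, lib) for lang, lib in libraries.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Tiny illustrative reference samples; real libraries would be modeled from large corpora.
LIBRARIES = {
    "en": build_library(["the quick brown fox jumps over the lazy dog and then the cat"]),
    "de": build_library(["der schnelle braune fuchs springt über den faulen hund und die katze"]),
}

# Hypothetical mapping from language identifier to a text-to-speech engine name.
ENGINES = {"en": "english_tts_engine", "de": "german_tts_engine"}

language, likelihood = identify_language("the dog and the fox", LIBRARIES)
engine = ENGINES[language]  # the engine that would be asked to speak the message
```

With samples this small the scores are only indicative; the point of the sketch is the flow from trigraph counting, through comparison against per-language reference statistics, to selecting an engine from the returned language identifier.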

The description herein provides for these and other 
variations upon the present invention, which is limited 
only by the following claims. 

Claims 

1. Method for language-based conversion of a text 
message into speech, preferably in a voice mes- 
saging system providing voice messaging services 
to a set of subscribers, comprising the steps of: 

retrieving a text message; 
automatically generating a language identifier 
corresponding to the text message; 
converting the text message into computer- 
generated speech based upon the language 
identifier; and 

playing the computer-generated speech to a 
subscriber. 

2. Voice messaging system providing voice messag- 
ing services to a set of subscribers, comprising 

means for retrieving a text message; 
means for automatically generating a language 
identifier corresponding to the text message; 
means for converting the text message into 
computer-generated speech based upon the 
language identifier; and 
means for playing the computer-generated 
speech to a subscriber. 

3. Voice messaging system according to claim 2, 
comprising 

a voice gateway server connectable to a com- 
puter network and a Private Branch Exchange 
(PBX), said voice gateway server facilitating 
the exchange of messages between computer 
users and a telephone system, and additionally 
providing voice messaging services to sub-
scribers, each of whom is preferably a compu-
ter user identified in the e-mail directory, and/or
wherein

- the voice gateway server preferably comprises

-- a voice board,
-- a network interface unit,
-- a processing unit,
-- a data storage unit, and
-- a memory wherein

--- a set of voice messaging applica-
tion units;
--- a message buffer;
--- a plurality of text-to-speech engines
and corresponding phoneme libraries;
--- a trigraph analyzer; and
--- a set of corecurrence libraries

reside, with

the computer network preferably including a
plurality of computers coupled to a file server,
through which computer users identified in an
electronic mail (e-mail) directory exchange
messages.

4. Voice messaging system according to claim 3, 
wherein 

each voice messaging application unit com-
prises program instructions for providing voice
messaging functions such as call answering,
automated attendant, and message store/for-
ward operations to voice messaging subscrib-
ers.

5. Voice messaging system according to claim 2, 
wherein 

- a trigraph analyzer sequentially

-- examines 3-character combinations
within a text message;
-- determines occurrence frequencies for
the character combinations;
-- compares the occurrence frequencies
with reference occurrence statistics mod-
eled from text samples written in particular
languages; and
-- generates a language identifier and a
likelihood value for the text message, and
wherein

- a message inquiry unit selects an appropriate
text-to-speech engine for converting the text
message into computer-generated speech that
is played to a subscriber based upon the lan-
guage identifier.

6. Voice messaging system according to claim 5,
wherein

said message inquiry unit is provided for

- directing message playback operations,
and/or
- playing the subscriber's voice messages
in a conventional manner in response to a
subscriber's issuance of a voice message
review request, and/or
- initiating automatic language identifica-
tion operations, followed by a text-to-
speech conversion performed in accord-
ance with the results of the language iden-
tification operations in response to a text
message review request.

7. Voice messaging system according to claim 3, 
wherein 

the trigraph analyzer examines a text
sequence, and
performs language identification operations by

- first determining the occurrence frequen-
cies of sequential 3-character combina-
tions within the text, and
- subsequently comparing the determined
occurrence frequencies with reference
occurrence statistics for various lan-
guages.

8. Voice messaging system according to claim 7, 
wherein 

the set of reference occurrence statistics asso-
ciated with a given language preferably being
stored together as a corecurrence library. 

9. Voice messaging system according to claim 7, 
wherein 

the trigraph analyzer

- determines a closest match between the
determined occurrence frequencies and a
particular corecurrence library, and

- returns a corresponding language identi- 
fier and likelihood value to the message 
inquiry unit. 

10. Voice messaging system according to claim 9,
wherein 

the message inquiry unit subsequently

- selects a text-to-speech engine and an 
associated phoneme library, and 

- initiates the conversion of the text mes- 
sage into computer-generated speech that 
is played to the subscriber in a conven- 
tional manner. 